A Computational Grammar for Georgian

Author

  • Paul Meurer
Abstract

In this paper, I give an overview of an ongoing project that aims to build a full-scale computational grammar for Georgian in the Lexical Functional Grammar (LFG) framework, and I try to illustrate both practical and theoretical aspects of grammar development. The rich and complex morphology of the language is a major challenge when building a computational grammar for Georgian that is meant to be more than a toy system. I discuss my treatment of the morphology and show how morphology interfaces with syntax. I then illustrate how some of the main syntactic constructions of the language are implemented in the grammar. Finally, I present the indispensable tools used in developing the grammar system: fst, the XLE parsing platform, the LFG Parsebanker, and a large searchable corpus of non-fiction and fiction texts.

Similar resources

A Finite-State Model of Georgian Verbal Morphology

Georgian is a less commonly studied language with complex, non-concatenative verbal morphology. We present a computational model for generation and recognition of Georgian verb conjugations, relying on the analysis of Georgian verb structure as a word-level template. The model combines a set of finite-state transducers with a default inheritance mechanism.

Polynomial Pregroup Grammars parse Context Sensitive Languages

Pregroup grammars with a possibly infinite number of lexical entries are polynomial if the length of type assignments for sentences is a polynomial in the number of words. Polynomial pregroup grammars are shown to generate the standard mildly context sensitive formal languages as well as some context sensitive natural language fragments of Dutch, Swiss German or Old Georgian. A polynomial recogn...

Semilinearity as a Syntactic Invariant

Mildly context sensitive grammar formalisms such as multi-component TAGs and linear context free rewrite systems have been introduced to capture the full complexity of natural languages. We show that, in a formal sense, Old Georgian can be taken to provide an example of a non-semilinear language. This implies that none of the aforementioned grammar formalisms is strong enough to generate this l...

Intonational Phonology of Georgian

This paper proposes a prosodic structure and the tonal pattern of Georgian, the national language of Georgia. The language has three prosodic units above the Word: Intonation Phrase (IP), Intermediate Phrase (ip), and Accentual Phrase (AP). All these units are marked by a boundary tone, but an AP in Georgian is unique typologically in that it has pitch accent linked to a stressed syllable and p...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...


Publication date: 2007